## Meta Info

Homepage: https://www.usenix.org/conference/osdi25

## Papers

### LLM Inference

- [2412.17246] Fast and Live Model Auto Scaling with O(1) Host Caching
  - Huawei & SJTU IPADS
  - Serverless Computing, Model Autoscaling
- [2502.04563] WaferLLM: A Wafer-Scale LLM Inference System
  - Edinburgh (Luo Mai) & Microsoft
  - Wafer-Scale LLM Inference

### GPU Sharing

- Preemptive Scheduling for Diverse XPUs using Multi-level Hardware Model
  - SJTU IPADS

### Resource Allocation

- [2412.11447] Zeal: Rethinking Large-Scale Resource Allocation with "Decouple and Decompose"